Abstract


We establish an optimal on-line relationship between tree machines and random access machines (RAMs). We present an on-line simulation of a tree machine of time complexity t by a log-cost RAM of time complexity $O((t\log t)/\log\log t)$. Using information-theoretic techniques, we show that this simulation is optimal.

We adapt the simulation of a tree machine to devise an on-line simulation of a d-dimensional Turing machine of time complexity t by a log-cost RAM running in time $O(t(\log t)^{1-1/d}(\log\log t)^{1/d})$.

1 Introduction


The  random  access  machine  (RAM)  and  the  Turing  machine  (TM)  are  the 
standard  models  for  sequential  computation.  Research  into  the  use  of  time 
and  space  by  these  and  other  models  gives  us  insight  into  their  computational 
power.  This  research  includes  analyzing  how  two  different  models  use  time 
and  space,  and  comparing  time  and  space  utilization  within  a  single  model. 
Another  avenue  of  investigation  is  determining  how  altering  the  definitions  of 
time  and  space  (for  example,  log-cost  versus  unit-cost)  for  a  model  affects  its 
computational  power.  Slot  and  van  Emde  Boas  (1988),  for  example,  showed 
how  space  equivalence  of  RAMs  and  Turing  machines  is  affected  by  varying 
the  definition  of  space  complexity  for  RAMs. 

Paul and Reischuk (1981) used tree machines to investigate the relationships between time and space for random access machines and multidimensional Turing machines. They presented a simulation of a log-cost RAM of time complexity t by a tree machine of time complexity O(t). They also showed that a tree machine of time complexity t can be simulated off-line by a unit-cost RAM of time complexity $O(t/\log\log t)$. Loui (1984b) showed that a multihead tree machine of time complexity t can be simulated by a tree machine with only two worktape heads in time $O((t\log t)/\log\log t)$.

We present an on-line simulation of a tree machine of time complexity t by a log-cost RAM of time complexity $O((t\log t)/\log\log t)$. Using the notion of incompressibility from Kolmogorov complexity (Li and Vitanyi, 1988), we show that this simulation is optimal. This appears to be the first application of Kolmogorov complexity to sequential RAMs. It is significant because few algorithms have been shown to be optimal.

Using similar techniques, we design an efficient on-line simulation of a d-dimensional Turing machine of time complexity t by a log-cost RAM running in time $O(t(\log t)^{1-1/d}(\log\log t)^{1/d})$. For d = 1, the running time is $O(t\log\log t)$, which is the same as the result of Katajainen et al. (1988).

This work is a complement to Loui's (1983) simulation of tree machines by multidimensional Turing machines and Reischuk's (1982) simulation of multidimensional Turing machines by tree machines.

All  logarithms  in  this  paper  are  taken  to  base  2. 

2  Machine  Definitions 

All  machines  that  we  consider  have  a  two-way  read-only  input  tape  and  a 
one-way  write-only  output  tape.  The  principal  differences  in  the  machines 
are  in  their  storage  structures. 

A tree machine, a generalization of a Turing machine, has a storage structure that consists of a finite collection of complete infinite rooted binary trees, called tree worktapes. Each cell of a worktape can store a 0 or 1. Each worktape has one head. A worktape head can shift to a cell's parent or to its left or right child. Initially, every worktape head is on the root of its worktape, and all cells contain 0.

Let W be a tree worktape. We fix a natural bijection between the positive integers and the cells of W. We refer to the integer corresponding to a particular cell as that cell's location. Write cell(b) for the cell at location b. Define cell(1) as the root of W. Then cell(2b) is the left child of cell(b) and cell(2b + 1) is the right child of cell(b).
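This numbering is the familiar heap layout of a complete binary tree. A minimal Python sketch of it follows; the function names are ours and purely illustrative:

```python
def left(b: int) -> int:
    """Location of the left child of cell(b)."""
    return 2 * b


def right(b: int) -> int:
    """Location of the right child of cell(b)."""
    return 2 * b + 1


def parent(b: int) -> int:
    """Location of the parent of cell(b); the root cell(1) has none."""
    if b == 1:
        raise ValueError("the root has no parent")
    return b // 2


def depth(b: int) -> int:
    """Depth of cell(b), with the root at depth 0."""
    return b.bit_length() - 1


# The bits of b after its leading 1 spell the path from the root:
# 0 means "go left", 1 means "go right". For b = 13 = 0b1101 the path
# is right, left, right, so depth(13) == 3 and parent(13) == 6.
```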

Each step of a tree machine consists of reading the contents of the worktape cells and input cell currently scanned, writing back on the same worktape cells and (possibly) to the currently accessed output cell, and (possibly) shifting each worktape head and the input head. If the tree machine writes on the output tape, it also shifts the output head.

The time complexity t(n) of a tree machine is defined in the natural way.

A multihead d-dimensional Turing machine consists of a finite control and a finite number of d-dimensional worktapes, each with one worktape head. A d-dimensional worktape comprises an infinite number of cells, each of which is assigned a d-tuple of integers called the coordinates of the cell. The coordinates of adjacent cells differ in just one component of the d-tuple, by ±1. At each step of the computation, the machine reads the symbols in the currently accessed input and worktape cells, (possibly) writes symbols on the currently accessed output and worktape cells, (possibly) shifts the input head, and shifts each worktape head in one of 2d + 1 directions: either to one of the 2d adjacent cells or to the same cell.
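For concreteness, the 2d + 1 possible head moves can be enumerated as coordinate offsets. A small Python sketch; representing a move as a d-tuple offset is our own illustrative choice:

```python
def moves(d: int) -> list[tuple[int, ...]]:
    """The 2d + 1 head moves on a d-dimensional worktape, as coordinate offsets."""
    result = [tuple([0] * d)]          # stay on the same cell
    for axis in range(d):
        for step in (1, -1):           # +1 or -1 in exactly one coordinate
            offset = [0] * d
            offset[axis] = step
            result.append(tuple(offset))
    return result


def shift(cell: tuple[int, ...], offset: tuple[int, ...]) -> tuple[int, ...]:
    """Coordinates of the cell reached from `cell` by the move `offset`."""
    return tuple(c + dc for c, dc in zip(cell, offset))
```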

The random access machine (RAM) (Aho et al., 1974; Cook and Reckhow, 1973; Katajainen et al., 1988) consists of the following: a finite sequence of labeled instructions; a memory consisting of an infinite sequence of registers, indexed by nonnegative integer addresses (register r(j) has address j); and a special register AC, called the accumulator, used for operating on data. Each register, including AC, holds a nonnegative integer; initially all registers contain 0. Each cell on the input tape contains a 0 or 1. The following RAM instructions are allowed ((x) denotes the contents of register r(x); (AC) denotes the contents of AC):


input. Read the current input symbol into AC and move the input head one cell to the right.

output. Write (AC) to the output tape and move the output head one cell to the right.

jump θ. Unconditional transfer of control to the instruction labeled θ.

jgtz θ. Transfer control to the instruction labeled θ if (AC) > 0.

load =C. Load integer C into AC.

load j. Load (j) into AC.

load *j. (Load indirect) Load ((j)) into AC.

store j. Store (AC) into r(j).

store *j. (Store indirect) Store (AC) into register r((j)).

add j. Add (j) to (AC) and place the result in AC.

sub j. If (j) > (AC), then load 0 into AC; otherwise, subtract (j) from (AC) and place the result in AC.
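To make the semantics of this instruction set concrete, here is a small Python interpreter under the unit-cost criterion. It is a sketch, not the authors' formalism: encoding an instruction as a triple (op, mode, arg), using list indices as labels, the `max_steps` safety bound, and halting when an input instruction finds the input exhausted are all our own assumptions.

```python
from collections import defaultdict


def run_ram(program, tape, max_steps=10_000):
    """Run a unit-cost RAM program; return (output list, number of steps).

    Each instruction is a triple (op, mode, arg): mode is "=" for a literal,
    "" for direct, "*" for indirect addressing; the label of instruction i is i.
    """
    mem = defaultdict(int)            # registers r(0), r(1), ..., all initially 0
    ac = pc = head = steps = 0
    out = []
    while 0 <= pc < len(program) and steps < max_steps:
        op, mode, arg = program[pc]
        steps += 1                    # unit cost: every instruction costs 1
        pc += 1
        if op == "input":
            if head == len(tape):     # assumption: halt when input is exhausted
                break
            ac = tape[head]
            head += 1
        elif op == "output":
            out.append(ac)
        elif op == "jump":
            pc = arg
        elif op == "jgtz":
            if ac > 0:
                pc = arg
        elif op == "load":
            if mode == "=":
                ac = arg
            elif mode == "*":
                ac = mem[mem[arg]]
            else:
                ac = mem[arg]
        elif op == "store":
            mem[mem[arg] if mode == "*" else arg] = ac
        elif op == "add":
            ac += mem[arg]
        elif op == "sub":
            ac = max(ac - mem[arg], 0)  # 0 if (j) > (AC)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return out, steps
```

For example, the four-instruction loop input; output; load =1; jgtz 0 copies its input to its output.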

Define the length of a nonnegative integer i as the minimum positive integer w such that $i \le 2^w - 1$ (approximately the logarithm of i). The
length of a register is the length of the integer contained in the register (note
that the length of a register is a timc-dejx

We  consider  two  time  complexity  moas 
of  each  RAM  instruction.  For  the  unit-co>‘ 
one  unit  of  time.  For  the  log-cost  RAM ,  \v< 
to  the  logarithmic  cost  criterion  (Katajain 
instruction  is  the  sum  of  the  lengths  of  1 1 
contents)  involved  in  its  execution.  The 
the  maximum  total  time  used  in  comput, 
possible,  of  course,  to  define  time  compl< 
charge  some  other  function  f(j)  for  acc< 
1987). 
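The length function and a logarithmic cost charge can be sketched as follows. The per-instruction charges below are modelled on the logarithmic cost table of Aho et al. (1974), charging the lengths of every address and every register content an instruction touches; since the passage above is cut off, treat the exact table, and the unit charge for jump, input, and output, as assumptions.

```python
def length(i: int) -> int:
    """Minimum positive w with i <= 2**w - 1: the bit length of i, but at least 1."""
    if i < 0:
        raise ValueError("length is defined for nonnegative integers only")
    return max(1, i.bit_length())


def log_cost(op: str, mode: str, arg: int, mem: dict, ac: int) -> int:
    """Assumed log-cost charge of one instruction, given memory and (AC) before it runs."""
    def r(a: int) -> int:             # contents of register r(a)
        return mem.get(a, 0)

    if op == "load":
        if mode == "=":
            return length(arg)
        if mode == "":
            return length(arg) + length(r(arg))
        return length(arg) + length(r(arg)) + length(r(r(arg)))
    if op == "store":
        indirect = length(r(arg)) if mode == "*" else 0
        return length(ac) + length(arg) + indirect
    if op in ("add", "sub"):
        return length(ac) + length(arg) + length(r(arg))
    if op == "jgtz":
        return length(ac)
    return 1                          # jump, input, output: one unit (simplification)
```

Under such a table, accessing a register whose address and contents are about log t bits long costs Θ(log t), which is where the log t factor of a straightforward simulation comes from.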

In  our  simulations,  we  group  the  regi> 
ories,  each  memory  containing  an  infxnit . 
not  increase  the  cost  in  time  by  more  thai 
simply  interleave  these  memories  into  one 
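Interleaving can be pictured as follows: with k memories, register j of memory m is placed at address kj + m. This layout is our assumption (the passage above is cut off), but it shows why the overhead is small: kj + m has at most length(j) + length(k) bits, so each log-cost charge grows by at most an additive constant, and for a constant k the RAM can form kj with k - 1 additions.

```python
def interleave(memory: int, address: int, k: int) -> int:
    """Physical address of register `address` of memory number `memory` (0 <= memory < k)."""
    if not 0 <= memory < k:
        raise ValueError("memory index out of range")
    return k * address + memory


def deinterleave(physical: int, k: int) -> tuple[int, int]:
    """Inverse of interleave: returns (memory, address)."""
    return physical % k, physical // k
```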

We  use  a  technique  of  Katajainen  (t 
registers  in  order  to  find  the  bit  represent 
This  divide-and-conquer  strategy  involves 

Lemma 2.1 (Katajainen et al., 1988) If I
it is possible to compute the u-bit represen
numeric value of a u-bit string, both in O

Lemma 2.2 (Katajainen et al., 1988) 77-
can be built in $O(u2^u)$ time on a log-cost I

A machine M of time complexity t is simulated by a machine M' on-line in time f(t) if for every time step s at which M reads or writes a symbol, there is a corresponding time step s' at which M' reads or writes the same symbol, and $s' \le f(s)$.

3  Simulation  of  a  Tree  Machine 

3.1  Upper  Bound 

It is straightforward to simulate a tree machine with a log-cost RAM in time $O(t\log t)$. In fact, such a simulation is used in Theorem 3.2 to show that a tree machine can be simulated by a unit-cost RAM in real time. However, we can do better than the straightforward simulation for log-cost RAMs.

For simplicity, we consider tree machines with only one worktape, but our results generalize to multiple worktapes. Let T be a tree machine of time complexity t with one worktape W. We show that there is a RAM R that simulates T on-line in time $O((t\log t)/\log\log t)$.

We first provide a brief description of the simulation. We choose parameters h and u such that $u = 2^{2h+2} - 1$. We specify the values of h and u later. As noted earlier, R has several memories. R maintains in the main memory the entire contents of W. The main memory represents W as overlapping subtrees, called blocks. R represents the contents of each block Wx in one register rx of the main memory. When the worktape head is in a particular block Wx, R represents Wx in the cache memory. Step-by-step simulation is carried out in the cache, which represents the block Wx in breadth-first order, one cell of Wx per register of the cache.

Because blocks overlap, when the worktape head exits Wx, it is positioned in the middle of some other block Wy. At this time R packs the contents of the cache back into rx in the main memory and unpacks the contents of ry into the cache.

The  details  of  the  simulation  follow. 

Let W[x, s] be the complete subtree of W of height s rooted at cell(x). A block is any subtree Wx = W[x, 2h + 1] such that the depth of cell(x) is a multiple of h + 1. Since a block has height 2h + 1, it contains $2^{2h+2} - 1 = u$ cells. Let the relative location of a cell within a block be defined in a manner similar to the location of a cell, where the relative location of the root of the block is 1, the relative locations of its children are 2 and 3, and so on.
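The correspondence between relative and absolute locations follows directly from the heap numbering: the cell at relative location z lies $k = \lfloor \log z \rfloor$ levels below cell(x), so its location is $x \cdot 2^k + (z - 2^k)$. A Python sketch of this bookkeeping (the helper names are hypothetical, not from the text):

```python
def block_size(h: int) -> int:
    """Number u of cells in a block, i.e. in a complete subtree of height 2h + 1."""
    return 2 ** (2 * h + 2) - 1


def is_block_root(x: int, h: int) -> bool:
    """True if cell(x) roots a block: its depth is a multiple of h + 1."""
    return (x.bit_length() - 1) % (h + 1) == 0


def rel_to_abs(x: int, z: int) -> int:
    """Location of the cell with relative location z in the subtree rooted at cell(x)."""
    k = z.bit_length() - 1            # the cell lies k levels below cell(x)
    return x * 2 ** k + (z - 2 ** k)


def abs_to_rel(x: int, b: int) -> int:
    """Relative location of cell(b) in the subtree rooted at cell(x)."""
    k = b.bit_length() - x.bit_length()
    if k < 0 or b >> k != x:
        raise ValueError("cell(b) is not a descendant of cell(x)")
    return 2 ** k + (b - x * 2 ** k)
```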

Call a block Wp the parent block of Wx if cell(p) is the ancestor of cell(x) at distance h + 1 from cell(x). If Wx is the parent block of Wc, then Wc is a child block of Wx. Each block has $2^{h+1}$ child blocks. The topmost block of W, which contains the root of W, is called the root block.

Define the top half of a block Wx as W[x, h], and define the bottom half of Wx as the remaining cells of the block. Note that the top half of the block Wx is part of the bottom half of Wp, its parent block, so that the blocks overlap. Call the portion of Wx shared by Wp (i.e., the subtree W[x, h]) the common subtree of Wx and Wp.

R precomputes in separate memories two tables, half and translate. We explain later how R uses these tables. Here we describe their contents and how they are computed. Let half(z) (respectively, translate(z)) be the register in half (respectively, translate) at address z.

Half(z) contains $\lfloor z/2 \rfloor$. For $z = 1, \ldots, \lfloor u/2 \rfloor$, R stores z in half(2z) and half(2z + 1).

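For $z = 2^{2h+1}, \ldots, u$, translate(z) contains $(z \bmod 2^h) + 2^h$. These addresses are the relative locations of the leaves of a block; each leaf lies at depth h of the child block that contains it. R never refers to any register in translate with address less than $2^{2h+1}$. Translate is computed as follows:

    i := 2^h
    for z = 2^(2h+1) to u do
        translate(z) := i
        i := i + 1
        if i = 2^(h+1) then i := 2^h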
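Both tables can be built with additions and stores alone. A Python sketch (lists stand in for the two memories, index 0 unused); it relies on the fact that a leaf of a block at relative location z lies at depth h of the child block containing it, so its relative location there is $(z \bmod 2^h) + 2^h$:

```python
def build_tables(h: int) -> tuple[list[int], list[int]]:
    """Tables half and translate for blocks of height 2h + 1 (index 0 unused)."""
    u = 2 ** (2 * h + 2) - 1
    half = [0] * (u + 1)
    for z in range(1, u // 2 + 1):
        half[z + z] = z               # floor((2z) / 2) = z
        half[z + z + 1] = z           # floor((2z + 1) / 2) = z
    translate = [0] * (u + 1)         # entries below 2^(2h+1) are never used
    i = 2 ** h
    for z in range(2 ** (2 * h + 1), u + 1):  # the leaves of a block
        translate[z] = i              # equals (z mod 2^h) + 2^h
        i += 1
        if i == 2 ** (h + 1):
            i = 2 ** h
    return half, translate
```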

We now show how R simulates the tree machine using the cache. Assume the head of T is currently scanning a cell in block Wx. Let cache(z) be the register in the cache with address z and let cell(x, z) be the cell in Wx with relative location z. For each z = 1, ..., u, register cache(z) contains the bit in cell(x, z); for example, cache(1) contains the contents of cell(x, 1) = cell(x), the root of Wx. Thus R uses u registers of the cache, each register containing one bit.

While the head of T remains in Wx, R keeps track of the head's location with the cache address register in the working memory, a memory maintained by R for storing information necessary for miscellaneous tasks. If the cache address register contains z, then cell(x, z) is currently being accessed in T.

To simulate a tree machine operation at cell(x, z), R loads the contents (one bit) of cache(z) into AC. Once the contents are in AC, R simulates one step of T by storing either 0 or 1 in cache(z).

If the head of T moves to a child of cell(x, z), then the new address for the cache address register, as well as the relative location of the new cell being read, is either 2z or 2z + 1. With one or two additions, R computes this new address and places it in the cache address register. When the head of T moves to the parent of cell(x, z), the address of the corresponding cache register is $\lfloor z/2 \rfloor$. Because R has no division operation, it accesses the proper register of table half to retrieve the new address in cache.
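Moving the head inside the cache therefore needs only additions and one table lookup. A Python sketch under our own conventions: moves are named "left", "right", and "parent", and a return value of None signals that the head is leaving the current block, so that the page-switching steps take over.

```python
def make_half(u: int) -> list[int]:
    """Table half: half[z] = floor(z / 2), built with additions only (index 0 unused)."""
    half = [0] * (u + 1)
    for z in range(1, u // 2 + 1):
        half[z + z] = z
        half[z + z + 1] = z
    return half


def cache_move(z: int, move: str, half: list[int], u: int):
    """New cache address after the head moves from cell(x, z), or None if it leaves Wx."""
    if move == "parent":
        return None if z == 1 else half[z]       # table lookup instead of division
    if move not in ("left", "right"):
        raise ValueError(f"unknown move {move!r}")
    child = z + z if move == "left" else z + z + 1   # one or two additions
    return child if child <= u else None          # a leaf's children lie outside Wx
```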

To describe what happens when the worktape head moves out of the current block, we first show how the blocks are stored in main memory. Main memory is divided into pages consisting of $2^{h+1} + 3$ registers each. A page corresponds to a visited block of W. Let page(x) be the page representing Wx. Define the address of a page to be the address of the first register in the page. The first register in page(x) is the contents register. For the page representing the root block, the contents register contains the entire contents of that block. For every other block Wy, the contents register contains the contents of the bottom half of Wy. The contents of cells in a block are kept in breadth-first order; i.e., reading the binary string in the contents register from left to right is equivalent to reading the bottom half of the block it represents in breadth-first order. Initially, all cells of a block contain 0, so all contents registers initially contain 0.

Following the contents register is the rank register, containing a number i between 1 and $2^{h+1}$ indicating that Wx is the ith child of its parent block. The next register is the parent register, containing the address of the page representing the parent block of Wx. The next $2^{h+1}$ registers are the child registers of page(x). The mth child register of page(x) contains the address of the page representing the mth child block of Wx, or 0 if that child block has not been visited (see Figure 1).
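The page layout amounts to simple address arithmetic. A Python sketch, assuming (our assumption) that main memory begins at address `base` and that pages are allocated consecutively in order of first visit, so the kth page allocated (k = 0 for the root block) starts at $base + k(2^{h+1} + 3)$. Address 0 remains usable as the "not visited" marker in child registers because the root page is never a child.

```python
def page_registers(k: int, h: int, base: int = 0) -> dict[str, int]:
    """Addresses of the registers of the k-th page allocated (k = 0: root block)."""
    size = 2 ** (h + 1) + 3                     # contents, rank, parent, 2^(h+1) children
    start = base + k * size                     # the address of the page
    layout = {"contents": start, "rank": start + 1, "parent": start + 2}
    for m in range(1, 2 ** (h + 1) + 1):
        layout[f"child {m}"] = start + 2 + m    # the m-th child register
    return layout
```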

The first page in main memory corresponds to the root block. Blocks are then stored in the order in which they are visited. The page address register, a register in working memory, contains the address of the page in main memory corresponding to the currently accessed block.

Let Wx be the currently accessed block and let Wp be the parent block of Wx. When the tree worktape head moves out of Wx so that it is positioned in the middle of a child block Wc, R makes the proper changes to main memory and loads the cache from the contents register of page(c).

In main memory, R updates the contents registers of page(x) and page(p). To update page(x), R packs the contents of the registers of the cache which correspond to the bottom half of Wx into a single register in working memory (call it the transfer register, denoted by tr). Packing information in the cache consists of creating from the registers in the cache one binary string that represents the bottom half of a block (in the same format as a main memory register). The pack operation is that used by Katajainen et al. (1988). R then copies tr into the contents register of page(x) via AC (see Figure 2).
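The pack and unpack operations convert between one bit per cache register and a single integer. The Python sketch below shows only the divide-and-conquer structure and the breadth-first layout of a bottom half; it uses Python's shift operators, whereas the RAM above has no shifts and relies instead on the precomputed tables of Lemmas 2.1 and 2.2, so it is not a model of the RAM's cost.

```python
def pack(bits: list[int]) -> int:
    """Integer whose binary representation (most significant bit first) is `bits`."""
    if not bits:
        return 0
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    high, low = pack(bits[:mid]), pack(bits[mid:])   # divide and conquer
    return (high << (len(bits) - mid)) + low


def unpack(value: int, length: int) -> list[int]:
    """Inverse of pack: the length-bit representation of value, one bit per entry."""
    if length == 0:
        return []
    if length == 1:
        return [value]
    mid = length // 2
    low_len = length - mid
    high, low = value >> low_len, value & ((1 << low_len) - 1)
    return unpack(high, mid) + unpack(low, low_len)


def pack_bottom_half(cache: list[int], h: int) -> int:
    """Pack cache(2^(h+1)), ..., cache(u): a block's bottom half in breadth-first order."""
    u = 2 ** (2 * h + 2) - 1
    return pack(cache[2 ** (h + 1): u + 1])      # cache[0] is unused
```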

Updating page(p) consists of changing the bits of its contents register corresponding to the common subtree of Wx and Wp. R first saves the contents of the cache that encode the common subtree of Wx and Wc in a portion of working memory, since this information is needed in the cache as the top half of Wc. R also saves the contents of the cache that encode the common subtree of Wx and Wp. R then loads the contents register of page(p) into tr and unpacks the contents into the cache. The bits in working memory corresponding to the common subtree of Wx and Wp are then written into their proper locations in the portion of the cache representing the bottom half of Wp. R then packs the contents of the cache into tr and copies tr into the contents register of page(p).

[Figure 1: Worktape W (head moves from Wx to Wc). Shows block Wx of the tree worktape at depth j(h + 1) and the pages page(x) and page(c), with their contents, rank, parent, and child registers.]

[Figure 2: Updating page(p) in main memory.]
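Writing the saved bits back into Wp requires knowing where the common subtree sits inside Wp. If Wx is the rank-th child block of Wp, its root lies at depth h + 1 of Wp, at relative location $2^{h+1} + rank - 1$, and each cell of the top half of Wx follows by the heap numbering. A Python sketch, assuming child blocks are ranked from left to right starting at 1:

```python
def common_subtree_in_parent(h: int, rank: int) -> dict[int, int]:
    """Map each relative location z in the top half of Wx to its relative location in Wp."""
    if not 1 <= rank <= 2 ** (h + 1):
        raise ValueError("rank must lie between 1 and 2^(h+1)")
    root_in_parent = 2 ** (h + 1) + rank - 1   # depth h + 1 of Wp, rank-th from the left
    mapping = {}
    for z in range(1, 2 ** (h + 1)):           # the top half: depths 0..h of Wx
        k = z.bit_length() - 1                 # depth of cell(x, z) below the root of Wx
        mapping[z] = root_in_parent * 2 ** k + (z - 2 ** k)
    return mapping
```

Every image lies in the bottom half of Wp (relative locations $2^{h+1}$ through u), as the overlap of blocks requires.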

R  then  determines  whether  Wc  has  been  visited  before  by  checking  the 
contents  of  the  child  register  of  page(x)  corresponding  to  Wc.  If  the  child 
register  contains  a  valid  (i.e.,  nonzero)  address,  then  R  uses  that  address  to 
access  page(c).  R  then  loads  the  contents  register  of  page(c)  into  the  cache. 
This  action  is  similar  to  the  manipulation  of  page(p)  discussed  above.  R 
loads  the  contents  of  the  common  subtree  of  Wx  and  Wc  saved  in  working 
memory  into  the  registers  of  the  cache  representing  the  top  half  of  the  block. 

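If the child register of page(x) contains 0, then R allocates a new page to maintain the information on Wc, stores the address of the new page in that child register, and sets the rank and parent registers of the new page.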

R modifies the page address register to reflect the fact that the worktape head is now scanning block Wc. The address currently in this register is that of page(x). To find the address of page(c), R determines from the cache address register the quantity i such that Wc is the ith child of Wx, and then reads the ith child register of page(x) in the main memory. R writes this address into the page address register.

To modify the cache address register to reflect the relative location of the head within block Wc, R first translates the relative location of the leaf cell(x, z) in Wx to its relative location in Wc. Since leaf cell(x, z) in Wx lies at depth h of Wc and is the same as cell(c, $(z \bmod 2^h) + 2^h$), R uses the table translate described above. Using one or two additions, R then calculates the relative location in Wc of this cell's left or right child, depending on which branch the worktape head used to exit Wx. R then writes this new relative location into the cache address register.
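The downward exit can be traced in a few lines. A Python sketch, assuming child blocks are numbered from left to right starting at 1: the exit leaf z determines both the index i of Wc, via its ancestor at depth h + 1, and the leaf's relative location in Wc, after which one or two additions give the new cache address.

```python
def exit_to_child(z: int, move: str, h: int) -> tuple[int, int]:
    """Head leaves Wx downward from leaf cell(x, z) via its left or right child.

    Returns (i, z_new): Wc is the i-th child block of Wx, and the head is now
    at relative location z_new of Wc.
    """
    u = 2 ** (2 * h + 2) - 1
    if not 2 ** (2 * h + 1) <= z <= u:
        raise ValueError("only a leaf of the block can be exited downward")
    if move not in ("left", "right"):
        raise ValueError(f"unknown move {move!r}")
    i = (z >> h) - 2 ** (h + 1) + 1        # ancestor at depth h + 1 roots Wc (shift for brevity)
    leaf_in_child = z % 2 ** h + 2 ** h    # what table translate stores for z
    z_new = leaf_in_child + leaf_in_child + (1 if move == "right" else 0)
    return i, z_new
```

The new address always lies at depth h + 1 of Wc, the middle of the block.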

A  similar  sequence  of  operations  occurs  if  the  worktape  head  moves  out  of 
a  block  (and  further)  into  its  parent  block  instead  of  into  a  child  block.  Then 
R  uses  the  parent  register  to  determine  the  address  of  the  page  representing 
the  parent  block,  and  R  uses  the  rank  register  to  determine  the  relative 
location  of  the  worktape  head  within  the  parent  block. 
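The upward exit is symmetric. If Wx has rank r, its root is at relative location $2^{h+1} + r - 1$ of Wp, so the head lands on the parent of that cell, at depth h of Wp, which again leaves at least h + 1 steps before the next exit. A Python sketch (names hypothetical, ranks assumed numbered from 1, left to right):

```python
def exit_to_parent(rank: int, h: int) -> int:
    """Relative location in Wp reached when the head moves up from the root of Wx."""
    if not 1 <= rank <= 2 ** (h + 1):
        raise ValueError("rank must lie between 1 and 2^(h+1)")
    root_in_parent = 2 ** (h + 1) + rank - 1   # the root of Wx, seen from Wp
    return root_in_parent // 2                 # R would use table half here
```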

If R does not know the input size n ahead of time, then we let R adopt an incremental technique of Galil (1976). R begins by assuming that n = 2. If the input head reads a third symbol, then R begins again with n = 4, but it does not output symbols already printed. In general, R assumes $n = 2^k$ until it reads the $(2^k + 1)$th symbol, at which time R starts over with $n = 2^{k+1}$.

The values of u and h depend on the value of n; therefore u and h are recomputed each time the value of n is doubled.

Let the actual simulation (without the incremental method) run in time t'(n), where $t'(n) \ge n$. It can be shown by induction that the simulation with the incremental method runs in time at most $k't'(n)$, for some constant $k' > 0$.
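One way to see this bound, under assumptions we add here: t' is nondecreasing, $t'(2m) \ge 2t'(m)$ (growth at least linear, in line with $t'(n) \ge n$), and $t'(2m) \le c\,t'(m)$ for some constant c. If $2^k < n \le 2^{k+1}$, then R runs with the guesses $2, 4, \ldots, 2^{k+1}$, and we charge the run with guess $2^j$ at most $t'(2^j)$ (ignoring the O(1) work of reading the symbol that triggers a restart). Then

$$\sum_{j=1}^{k+1} t'(2^j) \le t'(2^{k+1}) \sum_{i=0}^{k} 2^{-i} < 2\,t'(2^{k+1}) \le 2c\,t'(2^k) \le 2c\,t'(n),$$

so $k' = 2c$ suffices.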

By  evaluating  the  cost  of  the  simulation  on  a  log-cost  RAM,  we  derive 
the  following  result. 

Theorem 3.1 A tree machine running in time t(n) can be simulated on-line by a log-cost RAM running in time $O((t(n)\log t(n))/\log\log t(n))$.

Proof. Because the blocks have height 2h + 1 and adjacent blocks overlap in h + 1 levels, each time the worktape head moves out of a block, it is exactly in the middle of another block; i.e., it will take at least h' = h + 1 steps before it exits this new block. Since the tree machine computation has at most t steps, the work of updating main memory from cache (packing), loading a new block to cache (unpacking), and directly simulating h' steps is performed at most t/h' times.

Updating main memory and loading a new block in cache involve the pack and unpack operations and a constant number of accesses to main memory. Registers in main memory have addresses no larger than $(t/h')(2^{h+1} + 3)$. Thus accesses to main memory take time $O(\log t + h)$.

By Lemma 2.1, the time for the pack and unpack operations is $O(u\log u)$. By Lemma 2.2, the time to create the tables necessary for these operations is $O(u2^u)$. The time to compute tables half and translate is $O(u)$.

Simulating one step of the tree machine consists of a constant number of accesses to cache, taking time $O(\log u)$. Thus simulating h' steps takes time $O(h'\log u)$.

The total time required for R, then, is

$$(t/h')\bigl(O(\log t + h) + O(u\log u) + O(h'\log u)\bigr) + O(u2^u).$$

Since $h' = h + 1 = \log(u+1)/2 = \Theta(\log u)$, the total time is

$$O\bigl((t\log t)/\log u + tu + t\log u + u2^u\bigr).$$

Choose h so that u is within a constant factor of $(\log t)/\log\log t$. Then the total time for the simulation is $O((t\log t)/\log\log t)$. □
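The choice of h can be made concrete. The Python sketch below picks the largest h with $u \le (\log t)/\log\log t$ and evaluates the four terms of the bound divided by t, with all constants in the O-terms set to 1 (an illustrative simplification, not part of the proof). It takes log t as input so that very large t can be handled.

```python
import math


def choose_h(log_t: float) -> int:
    """Largest h >= 0 with u = 2**(2h+2) - 1 <= (log t)/(log log t), or 0 if none."""
    if log_t <= 1:
        raise ValueError("need log t > 1")
    target = log_t / math.log2(log_t)
    h = 0
    while 2 ** (2 * (h + 1) + 2) - 1 <= target:
        h += 1
    return h


def cost_per_step(log_t: float, h: int) -> float:
    """(t log t)/log u + t u + t log u + u 2^u, divided by t, with all constants set to 1."""
    u = 2 ** (2 * h + 2) - 1
    return log_t / math.log2(u) + u + math.log2(u) + u * 2.0 ** (u - log_t)
```

For $\log t = 2^{20}$, for instance, this gives h = 6 (u = 16383), and the per-step cost stays below twice $(\log t)/\log\log t = 2^{20}/20$, far below the $\log t = 2^{20}$ per step of the straightforward simulation.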

For  unit-cost  RAMs,  we  have  a  much  stronger  result: 

Theorem 3.2 A tree machine can be simulated by a unit-cost RAM in real time.

Proof sketch. We design a unit-cost RAM R to simulate tree machine T with worktape W. R has a contents memory, a parent memory, and several working registers. Let contents(x) (respectively, parent(x)) be the register with address x in the contents (respectively, parent) memory. Contents(x) holds the contents of cell(x), the cell at location x in the worktape of T. If cell(x) is visited by T, then parent(x) contains the worktape location of the parent of cell(x). The working registers are used as temporary storage and to keep track of which cell is currently accessed by T.

R simulates one step of T with a constant number of accesses to the two memories and the working registers. For example, if the head moves from cell(x) to a child of cell(x), then R computes location 2x of the left child or 2x + 1 of the right child with one or two additions and stores x in parent(2x) or parent(2x + 1). Because R has no division instruction, it uses parent(x), rather than computing $\lfloor x/2 \rfloor$, when the head moves back up. Thus simulating t steps of T takes O(t) time on R. □
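A Python sketch of this unit-cost simulation; dictionaries stand in for the contents and parent memories, and the class and method names are ours. Each method performs a constant number of register operations, which is why the simulation runs in real time.

```python
from collections import defaultdict


class UnitCostTreeSim:
    """The RAM of Theorem 3.2, sketched: one register per visited worktape cell."""

    def __init__(self) -> None:
        self.contents = defaultdict(int)   # contents(x): the bit in cell(x)
        self.parent: dict[int, int] = {}   # parent(x): location of cell(x)'s parent
        self.cur = 1                       # working register: the head's location

    def read(self) -> int:
        return self.contents[self.cur]

    def write(self, bit: int) -> None:
        self.contents[self.cur] = bit

    def move(self, direction: str) -> None:
        x = self.cur
        if direction in ("left", "right"):
            child = x + x if direction == "left" else x + x + 1   # one or two additions
            self.parent[child] = x         # remember the way back: no division needed
            self.cur = child
        elif direction == "parent":
            if x == 1:
                raise ValueError("the root has no parent")
            self.cur = self.parent[x]
        elif direction != "stay":
            raise ValueError(f"unknown move {direction!r}")
```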


An  immediate  consequence  of  Loui’s  upper  bound  on  the  simulation  of  a 
tree  machine  by  a  multidimensional  TM  is  the  following: 

Theorem 3.3 (Loui, 1983) A log-cost RAM running in time t(n) can be simulated on-line by a multihead d-dimensional Turing machine running in time $O(t(n)^{1+1/d}/\log t(n))$.


Using  our  simulation  of  a  tree  machine  by  a  log-cost  RAM,  we  obtain  a 
nonlinear  lower  bound  for  simulating  a  RAM  by  a  multidimensional  Turing 
machine: 

Corollary 3.4 There is a log-cost RAM R running in time t(n) such that any multihead d-dimensional Turing machine S that simulates R on-line requires time $\Omega\bigl(t(n)^{1+1/d}(\log\log t(n))^{1+1/d}/(\log t(n))^{2+1/d}\bigr)$.

Proof. Let T be the tree machine described in the lower bound proof of Loui (1983). Let R be the RAM that uses the method in the proof of Theorem 3.1 to simulate tree machine T. T runs in real time, so by Theorem 3.1, R runs in time $t(n) = O((n\log n)/\log\log n)$. Now assume there is a d-dimensional Turing machine that simulates R on-line in time $o\bigl(t^{1+1/d}(\log\log t)^{1+1/d}/(\log t)^{2+1/d}\bigr)$. We thus have an on-line simulation of tree machine T running in time n by a d-dimensional Turing machine running in time $o(n^{1+1/d}/\log n)$. But we know from Loui (1983) that the lower bound on such a simulation is $\Omega(n^{1+1/d}/\log n)$; hence we have a contradiction. □
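The final step of the proof is a substitution, spelled out here. Write f for the bound in Corollary 3.4 and let $t^* = C(n\log n)/\log\log n$ bound the running time of R for some constant C; since f is increasing for large arguments, $f(t(n)) \le f(t^*)$, and using $\log t^* = \Theta(\log n)$ and $\log\log t^* = \Theta(\log\log n)$,

$$f(t^*) = \Theta\!\left(\frac{n^{1+1/d}(\log n)^{1+1/d}}{(\log\log n)^{1+1/d}}\cdot\frac{(\log\log n)^{1+1/d}}{(\log n)^{2+1/d}}\right) = \Theta\!\left(\frac{n^{1+1/d}}{\log n}\right).$$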
3.2 Lower Bound


We now show that the time bound of Theorem 3.1 is optimal within a constant factor. We begin with an overview of Kolmogorov complexity, which we use to prove the lower bound.

For strings σ, τ in $\{0,1\}^*$, let K(σ) be the Kolmogorov complexity of σ with respect to a universal Turing machine U. Define K(σ) to be the length of β, where β is the shortest binary string such that U(β) equals σ. Informally, K(σ) is the length of the shortest binary description of σ.

We say a string σ is incompressible if $K(\sigma) \ge |\sigma|$. Note that for all n there are $2^n$ binary strings of length n, but there are only $2^n - 1$ strings of length less than n, and each of them is a description of at most one string. Thus for all n, there is at least one incompressible string of length n.

A useful concept in Kolmogorov complexity is the self-delimiting string. For a natural number n, let bin(n) be the binary representation of n without leading 0's. For a binary string w, let $\bar{w}$ be the string resulting from placing a 0 between each pair of adjacent bits in w and adding a 1 to the end. Thus $\overline{110} = 101001$. We call the string $\overline{\mathrm{bin}(|w|)}\,w$ the self-delimiting version of w. The self-delimiting version of w has length $|w| + 2\lceil\log(|w|+1)\rceil$. When we concatenate several binary string segments of differing lengths, we can use self-delimiting versions of the strings so that we can determine where one string ends and the next string begins with little additional cost in the length of the concatenated string. Note that in such a concatenation it is not necessary to use a self-delimiting version of the last string segment.
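The encoding and its decoding are easy to check mechanically. A Python sketch using strings of "0" and "1" (function names ours); the decoder reads bit pairs of $\overline{\mathrm{bin}(|w|)}$ until it meets a pair whose second bit is 1, then reads |w| more bits:

```python
def bar(w: str) -> str:
    """Place a 0 between each pair of adjacent bits of w and append a 1."""
    return "0".join(w) + "1"


def self_delimiting(w: str) -> str:
    """The self-delimiting version of w: bar(bin(|w|)) followed by w."""
    return bar(format(len(w), "b")) + w


def read_self_delimiting(s: str, pos: int = 0) -> tuple[str, int]:
    """Decode one self-delimiting string starting at s[pos]; return (w, next position)."""
    length_bits = []
    while True:
        length_bits.append(s[pos])    # a bit of bin(|w|)
        last = s[pos + 1] == "1"      # the following bit is 1 only after the last one
        pos += 2
        if last:
            break
    n = int("".join(length_bits), 2)
    return s[pos:pos + n], pos + n
```

In a concatenation, everything after the last self-delimiting segment is taken as the final segment, which is why that segment needs no encoding.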

Kolmogorov complexity has recently gained popularity as a method for proving lower bounds. Li and Vitanyi (1988) provide a thorough summary of lower bound (and other complexity-related) results obtained using Kolmogorov complexity.

Theorem 3.5 There is a tree machine T running in time n such that for any log-cost RAM R, R requires time $t(n) = \Omega((n\log n)/\log\log n)$ to simulate T on-line.

Proof.  For  simplicity,  we  omit  floors  and  ceilings  in  this  proof. 

Tree machine T has one tree worktape and operates in real time. T's input alphabet is a set of commands of the form (e, ψ), where e ∈ {0, 1, ?} and ψ indicates whether the worktape head moves to a child or parent of the current cell or remains at the current cell. Suppose T is in a configuration in which the cell x at which the worktape head is located contains e'. On input (e, ψ), machine T writes e' on its output tape, and the worktape head writes e on cell x if e ∈ {0, 1}, but it writes e' (the current contents of x) on x if e = ?. At the end of the step the worktape head moves according to ψ. For every n that is a sufficiently large power of 2, we construct a series of n tree commands for which R requires time $\Omega((n\log n)/\log\log n)$. As in Loui (1983), the string of tree commands is divided into a filling part of length n/2 and a query part of length n/2.
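The behavior of T on a command sequence can be summarized as a short Python sketch; encoding ψ as one of "left", "right", "parent", or "stay", and storing the worktape as a dictionary, are our own choices:

```python
def run_tree_machine(commands: list[tuple[str, str]]) -> str:
    """Output of T on a list of commands (e, psi), starting from an all-0 worktape."""
    tape: dict[int, str] = {}         # location -> bit; absent cells hold "0"
    x = 1                             # the worktape head starts on the root
    out = []
    for e, psi in commands:
        current = tape.get(x, "0")    # e', the bit under the head
        out.append(current)           # T writes e' on its output tape
        if e in ("0", "1"):
            tape[x] = e               # e = "?" leaves the cell unchanged
        elif e != "?":
            raise ValueError(f"bad symbol {e!r}")
        if psi == "left":
            x = 2 * x
        elif psi == "right":
            x = 2 * x + 1
        elif psi == "parent":
            if x == 1:
                raise ValueError("the root has no parent")
            x //= 2
        elif psi != "stay":
            raise ValueError(f"bad move {psi!r}")
    return "".join(out)
```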

Let W be the worktape of T, and let $x_0$ be the root of W. Let $d = \log(n/8)$. Denote the complete subtree of W of height d whose root is $x_0$ by $W_d$. Let N = n/8. We consider the complexity of the simulation in terms of N.

We fill $W_d$, which has $2^{d+1} - 1 = 2N - 1$ cells, with an incompressible string τ of length 2N - 1 such that τ can be retrieved by a depth-first traversal of $W_d$. This is the filling part, for which T takes time 4N (= n/2).

The query part consists of a series of questions. A question is a string of $2 \log N$ tree commands that causes the worktape head to move from the root $x_0$ of the tree worktape to a cell at depth $d$ and back to $x_0$ without changing the contents of the worktape. As the head visits each cell during a question, $T$ outputs the contents of that cell. $T$ processes $N/(4 \log N)$ questions $Q_1, Q_2, \ldots$ during the query part. We show that after each question $Q_j$, there is a question $Q_{j+1}$ such that $R$ takes time $\Omega((\log^2 N)/\log\log N)$ to process $Q_{j+1}$, and Theorem 3.5 follows.
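A question can be sketched the same way, again under a hypothetical encoding (cells named by root-to-cell paths, moves spelled `left`, `right`, `parent`): descend along a root-to-depth-$d$ path with $e = {?}$, then climb back. The hypothetical helper `answer` lists what $T$ outputs, namely the contents of the cell under the head each time a command is issued.

```python
from itertools import product

def question(path):
    """Commands for a question to the depth-d cell named by `path` (a string
    over {"0", "1"}): walk down along `path`, then back up to x0, using e = "?"
    throughout so that no cell changes."""
    down = [("?", "left" if bit == "0" else "right") for bit in path]
    up = [("?", "parent")] * len(path)
    return down + up   # 2d = 2 log N commands

def answer(cells, path):
    """What T outputs on question(path): the cells at depths 0..d-1 on the
    way down, then the cells at depths d..1 on the way up."""
    prefixes = [path[:k] for k in range(len(path) + 1)]   # cells at depths 0..d
    return [cells[p] for p in prefixes[:-1]] + [cells[p] for p in reversed(prefixes[1:])]

d = 3                                                   # N = 2**d = 8
questions = ["".join(b) for b in product("01", repeat=d)]
cells = {"": "1", "1": "0", "10": "1", "101": "1"}
print(len(questions), len(question("101")))             # 8 6
print(answer(cells, "101"))                             # ['1', '0', '1', '1', '1', '0']
```

There is one possible question per cell at depth $d$, which is why there are exactly $N$ possible next questions.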

Assume that $R$ has just processed question $Q_j$. Let $P(N)$ be the maximum time necessary to process any possible next question. We show that some next question takes time $\Omega((\log^2 N)/\log P)$. Consequently, by definition, $P = \Omega((\log^2 N)/\log P)$; thus $P = \Omega((\log^2 N)/\log\log N)$.
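The last implication can be spelled out with a short case split. Let $c > 0$ be the constant hidden in $P = \Omega((\log^2 N)/\log P)$, and take $N$ large enough that $\log\log N \ge 1$. If $P \ge \log^2 N$, the claim is immediate. Otherwise $\log P < 2\log\log N$, and

$$
P \;\ge\; c\,\frac{\log^2 N}{\log P} \;>\; \frac{c}{2}\cdot\frac{\log^2 N}{\log\log N}.
$$

In both cases $P \ge \min(1, c/2)\,(\log^2 N)/\log\log N$.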

We  first  determine  the  total  time  t  required  for  R  to  process  all  possible 
next  questions. 

Divide worktape $W_d$ into $S = (\log N)/(2 \log P)$ sections, each of height $2 \log P$. For $s = 0, 1, \ldots, S - 1$, there are $P^{2s+2}$ exit points (bottom cells) in section $s$. We refer to any initial segment of a question as a partial question and to the portion of the question that is processed while the worktape head is in one section as a subquestion (see Figure 3). To compute $t$, we compute for $s = 0, 1, \ldots, S - 1$ the total time $t_s$ required for $R$ to process all possible subquestions in section $s$. Since the depth of $W_d$ is $\log N$, there are $N$ possible next questions. Each of the $P^{2s+2}$ bottom cells of section $s$ is visited during $N/P^{2s+2}$ of these questions.

[Figure 3: Processing section $s$ of worktape $W$ (the figure marks a partial question through section $s - 1$)]
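The following Python sketch tabulates these quantities for small parameters. It assumes $N$ and $P$ are powers of 2 with $2\log P$ dividing $\log N$, so that $S$ is an integer (the proof likewise omits floors and ceilings); `section_table` is a hypothetical helper. Each row is (section $s$, top depth, bottom depth, number of bottom cells, questions through each bottom cell).

```python
from math import log2

def section_table(N, P):
    """Rows (s, top depth, bottom depth, P**(2s+2), N / P**(2s+2)) for each
    section s of W_d, assuming N and P are powers of two and S is integral."""
    logN, logP = int(log2(N)), int(log2(P))
    height = 2 * logP                    # each section has height 2 log P
    assert logN % height == 0, "assumes S = log N / (2 log P) is an integer"
    S = logN // height
    rows = []
    for s in range(S):
        bottom = P ** (2 * s + 2)        # exit points (bottom cells) of section s
        rows.append((s, s * height, (s + 1) * height, bottom, N // bottom))
    return rows

for row in section_table(N=2**12, P=2**2):
    print(row)
# (0, 0, 4, 16, 256)
# (1, 4, 8, 256, 16)
# (2, 8, 12, 4096, 1)
```

In every row the last two columns multiply to $N$, matching the count of possible next questions.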

Let $\sigma_s$ be the string defined by the contents of the bottom cells of section $s$, from left to right; clearly, $|\sigma_s| = P^{2s+2}$.

Lemma 3.6 The string $\sigma_s$ is incompressible up to a term of $O(s \log P)$; i.e., $K(\sigma_s) \ge |\sigma_s| - O(s \log P)$.

Proof. The incompressible string $r$, which gives the contents of $W_d$, can be specified by a string composed of the following segments:

1. a self-delimiting string encoding this discussion ($O(1)$ bits)

2. a self-delimiting version of a binary string of length $K(\sigma_s)$ that specifies $\sigma_s$ ($K(\sigma_s) + O(s \log P)$ bits)

3. self-delimiting versions of the values of $s$ and $P$ ($O(\log s) + O(\log P)$ bits)

4. a string specifying the bits in $r$ but not in $\sigma_s$ ($2N - 1 - P^{2s+2}$ bits).

Thus $K(r) \le K(\sigma_s) + (2N - 1 - P^{2s+2}) + O(s \log P)$. But $K(r) \ge 2N - 1$; therefore, $K(\sigma_s) \ge P^{2s+2} - O(s \log P)$. □ Lemma 3.6


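Lemma 3.7 If $\ell \ge 1$ then $\sum_{i=1}^{\ell} \log i \ge (1/2)\,\ell \log \ell$.

Proof. For all $i$ such that $1 \le i \le \ell$, evidently $(i - 1)(\ell - i) \ge 0$; hence $i(\ell - i + 1) \ge \ell$. Consequently

$$
\begin{aligned}
\sum_{i=1}^{\ell} \log i &= (1/2)\sum_{i=1}^{\ell} \bigl(\log i + \log(\ell - i + 1)\bigr) \\
&= (1/2)\sum_{i=1}^{\ell} \log\bigl(i(\ell - i + 1)\bigr) \\
&\ge (1/2)\sum_{i=1}^{\ell} \log \ell \\
&= (1/2)\,\ell \log \ell.
\end{aligned}
$$

□ Lemma 3.7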
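A quick numerical sanity check of Lemma 3.7 (an illustration, not a substitute for the proof). The base of the logarithm does not matter, since both sides scale by the same factor; the sketch uses base 2, and `check_lemma_3_7` is a hypothetical helper. Equality holds at $\ell = 1$ and $\ell = 2$, hence the small tolerance.

```python
from math import log2

def check_lemma_3_7(max_ell):
    """Return the first ell <= max_ell with sum_{i<=ell} log i < (ell/2) log ell,
    or None if there is none."""
    total = 0.0
    for ell in range(1, max_ell + 1):
        total += log2(ell)                      # running sum of log i
        if total < 0.5 * ell * log2(ell) - 1e-9:
            return ell
    return None

print(check_lemma_3_7(10_000))  # None: no counterexample up to 10,000
```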


Lemma 3.8 For $s = 1, 2, \ldots, S - 1$, the number of registers accessed during the processing of all partial questions through section $s - 1$ is at most $4P^{2s+1}/\log P$.

Proof. Let $C = 4P/\log P$. By Lemma 3.7, for $P$ sufficiently large, $\sum_{i=1}^{C} \log i \ge P$. The processing of each partial question through section $s - 1$ could involve no more than $C$ registers; otherwise, because of the total cost of the addresses of these registers, $R$ would exceed time $P$ for some next question. There are $P^{2s}$ different partial questions possible through section $s - 1$, so there are no more than $4P^{2s+1}/\log P$ registers accessed for all possible partial questions. □ Lemma 3.8
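The arithmetic behind Lemma 3.8 can be checked numerically for small $P$. This sketch rounds $C$ down to an integer, which the proof ignores, and assumes the $i$th smallest register address is at least $i$, as in the discussion of $r_1, \ldots, r_m$ that follows; the function names are hypothetical.

```python
from math import floor, log2

def lemma_3_8_bound(P, s):
    """Return (C, C * P**(2s)): C = 4P/log P registers per partial question
    (rounded down here), times the P**(2s) partial questions through section s-1."""
    C = floor(4 * P / log2(P))
    return C, C * P ** (2 * s)

def address_cost(C):
    """Lower bound on the cost of accessing C distinct registers when the
    i-th smallest address is at least i: sum_{i=1}^{C} log i."""
    return sum(log2(i) for i in range(1, C + 1))

print(lemma_3_8_bound(16, 2))   # (16, 1048576), i.e. 4 * 16**5 / log2(16)
for k in range(2, 17):
    P = 2 ** k
    C, _ = lemma_3_8_bound(P, 1)
    assert address_cost(C) >= P, P   # touching C registers already costs >= P
```

For these powers of 2 the inequality $\sum_{i=1}^{C} \log i \ge P$ already holds from $P = 4$, so "sufficiently large" is a mild requirement here.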

Let us consider a particular section $s$. Let $r_1, r_2, \ldots, r_m$ be the registers, in order of increasing address, used to process tree commands in section $s$. The address of $r_i$ is at least $i$. For $1 \le i \le m$, let $X_i$ be the set of bottom cells $x$ of section $s$ such that $r_i$ is accessed while the worktape head is visiting some cell $y$ in section $s$, and either $y$ is an ancestor of $x$ or $y = x$ (see Figure 3). We say that $r_i$ operates on the bottom cells in $X_i$.

To compute a lower bound on $t_s$, we assess the contribution to $t_s$ of accessing register $r_i$. For $1 \le i \le m$, the total access time for register $r_i$ in section $s$ is at least the product of $\log i$ (since the address of $r_i$ is at least $i$), $|X_i|$ (the number of bottom cells that $r_i$ operates on), and $N/P^{2s+2}$ (the number of questions during which one of these bottom cells is visited). Totalling the time incurred by access to each register yields:

$$
t_s \ge \sum_{i=1}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}). \qquad (1)
$$

Using Lemma 3.10 below, we can determine a lower bound for $t_s$, but we first introduce the following technical lemma.
Lemma 3.9 (Loui, 1984a [Section 4]) Let $J$ and $M$ be integers such that $M \ge J$. A sorted $J$-member subset of $\{0, \ldots, M\}$ can be represented with no more than $2J \log(M/J) + 4J + 2$ bits.
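Lemma 3.9 is cited from Loui (1984a), whose encoding is not reproduced here. As an illustration only, the sketch below swaps in a different, standard technique: Elias gamma coding of the gaps between consecutive elements, prefixed by the gamma code of $J$. For $1 \le J \le M$ it stays within the same bound. Each gap $g$ costs $2\lfloor \log g \rfloor + 1$ bits; the gaps sum to at most $M + 1 \le 2M$, so by concavity of $\log$ they cost at most $2J\log(M/J) + 3J$ bits; and the gamma code of $J$ costs at most $2\log J + 1 \le J + 2$ bits. The code is also self-delimiting, the property that item 4 in the proof of Lemma 3.10 relies on. All function names are hypothetical.

```python
import random
from math import log2

def gamma_encode(n):
    """Elias gamma code of n >= 1: (len(bin n) - 1) zeros, then n in binary."""
    assert n >= 1
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def gamma_decode(bits, pos):
    """Decode one gamma codeword starting at bits[pos]; return (n, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end

def encode_subset(xs):
    """Encode a sorted, nonempty subset xs of {0, ..., M}: gamma(J), followed by
    gamma of each gap x_k - x_{k-1}, with x_0 = -1 so that every gap is >= 1."""
    assert xs and all(a < b for a, b in zip(xs, xs[1:]))
    out, prev = [gamma_encode(len(xs))], -1
    for x in xs:
        out.append(gamma_encode(x - prev))
        prev = x
    return "".join(out)

def decode_subset(bits):
    """Inverse of encode_subset; also returns how many bits were consumed,
    so a decoder can tell where the encoding ends."""
    J, pos = gamma_decode(bits, 0)
    xs, prev = [], -1
    for _ in range(J):
        gap, pos = gamma_decode(bits, pos)
        prev += gap
        xs.append(prev)
    return xs, pos

rng = random.Random(0)
M, J = 1000, 50
xs = sorted(rng.sample(range(M + 1), J))
bits = encode_subset(xs)
print(decode_subset(bits) == (xs, len(bits)))          # True
print(len(bits), "<=", 2 * J * log2(M / J) + 4 * J + 2)
```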

Let $h = (1/7)P^{2s+1}$.

Lemma 3.10 $\sum_{i=h}^{m} |X_i| \ge (1/23)P^{2s+2}$.

Proof. Assume that the conclusion is false. Then $r_1, \ldots, r_{h-1}$ operate on at least $(22/23)P^{2s+2}$ bottom cells in section $s$. We can specify the string $\sigma_s$ as follows: we obtain the bits of the cells in $X_h, \ldots, X_m$ explicitly. We obtain the other bits of $\sigma_s$ by simulating $R$ on each partial question to a bottom cell of section $s$ not in $\bigcup_{k=h}^{m} X_k$. On each such partial question, $R$ uses only registers $r_1, \ldots, r_{h-1}$ and registers accessed in sections $1, \ldots, s - 1$. Thus $\sigma_s$ can be specified with a string composed of the following segments:

1. a self-delimiting string encoding the program of $R$ and this discussion ($O(1)$ bits)

2. self-delimiting versions of the addresses and initial contents of registers accessed in sections $1, \ldots, s - 1$ (at most $8P^{2s+2}/\log P + O(s \log P)$ bits: by Lemma 3.8, at most $4P^{2s+1}/\log P$ registers are required, and for each register, the contents and the address could each require $P$ bits.)

3. self-delimiting versions of the addresses and initial contents of $r_1, \ldots, r_{h-1}$ ($(2/7)P^{2s+2} + O(s \log P)$ bits)

4. a string specifying the positions of cells in $X_k$ for $k \ge h$ (we use Lemma 3.9 with $J = (1/23)P^{2s+2}$ and $M = P^{2s+2}$; this requires at most $(14/23)P^{2s+2}$ bits. The encoding used to achieve Lemma 3.9 is such that the beginning and end of this string can easily be determined.)

5. a string specifying the contents of cells in $X_k$ for $k \ge h$ (at most $(1/23)P^{2s+2}$ bits).

This means that the number of bits needed to specify $\sigma_s$ is at most $(151/161)P^{2s+2} + O(P^{2s+2}/\log P) < P^{2s+2} - O(s \log P)$ for sufficiently large $P$. Thus we have a contradiction of Lemma 3.6. □ Lemma 3.10
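The constant $151/161$ comes from adding the sizes of segments 3 to 5 relative to $P^{2s+2}$; segments 1 and 2 contribute only $O(P^{2s+2}/\log P)$. The short check below uses exact fractions to confirm the sum, and confirms that $2\log 23 + 4 \le 14$, so Lemma 3.9 with $J = P^{2s+2}/23$ fits within $(14/23)P^{2s+2}$ bits up to its additive constant 2. The variable names are ours.

```python
from fractions import Fraction
from math import log2

# Segment sizes relative to P**(2s+2); O(1), O(s log P) terms and segments 1-2
# (which are O(P**(2s+2) / log P)) are left out.
seg3 = 2 * Fraction(1, 7)   # h = P**(2s+1)/7 registers, address + contents <= 2P bits each
seg4 = Fraction(14, 23)     # positions of the cells in X_k, k >= h (Lemma 3.9)
seg5 = Fraction(1, 23)      # contents of those cells, one bit each

# Lemma 3.9 with J = M/23 and M = P**(2s+2): 2J log(M/J) + 4J = J (2 log 23 + 4).
lemma_3_9_coeff = (2 * log2(23) + 4) / 23   # about 0.567
total = seg3 + seg4 + seg5

print(total)                                # 151/161
print(lemma_3_9_coeff <= float(seg4))       # True
```

The leftover slack $1 - 151/161 = 10/161$ is a constant fraction of $P^{2s+2}$, which eventually dominates the $O(P^{2s+2}/\log P)$ term.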

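Thus we have:

$$
\begin{aligned}
t_s &\ge \sum_{i=1}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}) && \text{(Inequality 1)} \\
&\ge \sum_{i=h}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}) \\
&\ge (N/P^{2s+2})(\log h)\sum_{i=h}^{m} |X_i| \\
&\ge (N/P^{2s+2})(\log h)(1/23)P^{2s+2} && \text{(Lemma 3.10)} \\
&= (1/23)N\bigl((2s+1)\log P - \log 7\bigr) && \text{(definition of } h\text{)} \\
&\ge (1/23)N s \log P.
\end{aligned}
$$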

Now sum $t_s$ over all $s$ to compute a lower bound for $t$, the total time required for $R$ to process all possible next questions:

$$
\begin{aligned}
t &= \sum_{s=0}^{S-1} t_s \\
&\ge \sum_{s=0}^{S-1} (1/23) N s \log P \\
&= (1/23) N (\log P) \cdot \frac{S(S-1)}{2} \\
&= (1/23) N (\log P)\left(\frac{\log^2 N}{8 \log^2 P} - \frac{\log N}{4 \log P}\right) \\
&= (1/184)\,\frac{N \log^2 N}{\log P} - (1/92)\,N \log N.
\end{aligned}
$$
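The closed form in the last two lines can be double-checked with exact rational arithmetic for parameter values where $S = (\log N)/(2\log P)$ is an integer. In this sketch `check_sum` is a hypothetical helper that takes $\log N$ and $\log P$ directly; $N$ enters only as a linear factor.

```python
from fractions import Fraction

def check_sum(logN, logP, N=1):
    """Compare sum_{s=0}^{S-1} (1/23) N s log P with its closed form,
    for S = log N / (2 log P) assumed to be an integer."""
    S = Fraction(logN, 2 * logP)
    assert S.denominator == 1, "S must be an integer"
    lhs = sum(Fraction(1, 23) * N * s * logP for s in range(int(S)))
    rhs = Fraction(1, 184) * N * logN**2 / logP - Fraction(1, 92) * N * logN
    return lhs == rhs

print(all(check_sum(L, p) for L, p in [(12, 2), (40, 4), (60, 3), (100, 5)]))  # True
```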

Since there are $N$ questions, we divide $t$ by $N$ to derive the average time needed by $R$ to process the next question, $\Omega((\log^2 N)/\log P)$. Some next question must require time greater than or equal to this average time. Since $P$ is the maximum time for some next question, $P = \Omega((\log^2 N)/\log P)$; hence, $P = \Omega((\log^2 N)/\log\log N)$.

Thus for each question $Q_j$ we can choose a next question $Q_{j+1}$ that takes time $\Omega((\log^2 N)/\log\log N)$. Since the query part has $N/(2 \log N)$ questions, our choice of questions means that the query part takes time $t = (N/(2 \log N))\,\Omega((\log^2 N)/\log\log N) = \Omega((N \log N)/\log\log N)$. The entire simulation takes at least time $t$. Since $N = n/8$, the lower bound holds for $n$ as well. □ Theorem 3.5

Because the lower bound proof considers only the time involved in accessing registers, the lower bound holds for RAMs with more powerful instructions, such as boolean operations or multiplication.

4 Simulation of a Multidimensional Turing Machine

By composing our simulation in subsection 3.1 of a tree machine by a log-cost RAM with Reischuk's (1982) simulation of a $d$-dimensional Turing machine by a tree machine, we obtain an on-line simulation of a $d$-dimensional Turing machine of time complexity $t$ by a log-cost RAM running in time $O((5^{d \log^* t}\, t \log t)/\log\log t)$. But we can improve this upper bound with a direct simulation.

Theorem 4.1 A $d$-dimensional Turing machine running in time $t(n)$ can be simulated on-line by a log-cost RAM running in time $O(t(n)(\log t(n))^{1-1/d}(\log\log t(n))^{1/d})$.

Proof sketch. We design a log-cost RAM $R$ that simulates a $d$-dimensional Turing machine $M$. For simplicity, assume $M$ has one worktape; our results generalize to $d$-dimensional Turing machines with more than one worktape. Let $s = ((\log t)/\log\log t)^{1/d}$. Partition the worktape of $M$ into $d$-dimensional cubes (call them boxes) with side length $s$. Let $\mathrm{corner}(i)$ be the cell in box $i$ whose coordinates are all the smallest.

For box $i$, if $\mathrm{corner}(i) = (i_1, \ldots, i_d)$, let $\mathrm{index}(i) = i_d t^{d-1} + i_{d-1} t^{d-2} + \cdots + i_1$. $R$ stores the contents of box $i$ in the register in main memory with address $\mathrm{index}(i)$. Step-by-step simulation is carried out in the cache. $R$ conducts the simulation in $t/s$ phases, each of $s$ steps of $M$. For each phase: $R$ unpacks the contents of the $3^d$ boxes that are within distance $s$ of the worktape head (the head remains within these boxes during the phase); $R$ simulates $M$ for $s$ steps; and $R$ packs the contents of the cache back to main memory. Using precomputed values of $t, t^2, \ldots, t^{d-1}$, $R$ quickly computes $\mathrm{index}(i')$ from $\mathrm{index}(i)$ when box $i'$ is adjacent to box $i$. For each phase, $R$ takes time $O(\log t)$ to access main memory, $O(\log t)$ to compute the addresses of the registers in main memory representing the new blocks needed in cache, $O(s \log s)$ to simulate $s$ steps in the cache, and $O(s^d \log s)$ to pack and unpack the appropriate registers (Lemma 2.1). Thus the total time for the simulation is:

$$
\begin{aligned}
(t/s)\bigl(O(\log t) + O(s \log s) + O(s^d \log s)\bigr)
  &= O\bigl((t \log t)/s + t s^{d-1} \log s\bigr) \\
  &= O\bigl(t (\log t)^{1-1/d} (\log\log t)^{1/d}\bigr).
\end{aligned}
$$

□
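A small sketch of the box addressing: `box_index` evaluates $\mathrm{index}(i)$ by Horner's rule, and `neighbor_index` moves to an adjacent box by adding or subtracting one precomputed step. Both names are hypothetical, and two assumptions are ours: coordinates are nonnegative and smaller than $t$, so that distinct boxes get distinct indices, and the steps $s \cdot t^{k}$ are precomputed, whereas the proof mentions precomputing $t, t^2, \ldots, t^{d-1}$.

```python
def box_index(corner, t):
    """index(i) = i_d t^(d-1) + ... + i_2 t + i_1 for corner(i) = (i_1, ..., i_d),
    evaluated by Horner's rule. Assumes 0 <= i_k < t for every coordinate."""
    idx = 0
    for c in reversed(corner):   # i_d first, i_1 last
        idx = idx * t + c
    return idx

def neighbor_index(idx, k, direction, steps):
    """Index of the box adjacent along dimension k (0-based) in the given
    direction (+1 or -1); steps[k] = s * t**k is assumed to be precomputed."""
    return idx + direction * steps[k]

t, s = 100, 3
steps = [s * t**k for k in range(2)]        # d = 2
i = box_index((6, 9), t)                     # 9*100 + 6
print(i, neighbor_index(i, 1, +1, steps) == box_index((6, 12), t))   # 906 True
```

Since indices are below $t^d$, each address has $O(d \log t) = O(\log t)$ bits for fixed $d$, in line with the $O(\log t)$ per-phase cost of accessing main memory.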

Once  again,  the  result  for  unit-cost  RAMs  is  much  stronger: 

Theorem  4.2  A  multidimensional  Turing  machine  can  be  simulated  by  a 
unit-cost  RAM  in  real-time. 

Proof. Schönhage (1980) showed that a unit-cost successor RAM can simulate a multidimensional Turing machine in real-time. It follows that a unit-cost RAM with addition and subtraction can simulate a multidimensional Turing machine in real-time as well. □

5  Conclusions 

Because the log-cost RAM is considered a “standard” among models of computation, it is important to determine its relationships to other models. Here we have shown an optimal on-line relationship between log-cost RAMs and tree machines. We have constructed an analogous efficient simulation of multidimensional Turing machines by log-cost RAMs. We hope that this work will lead to further study of relationships between other models of computation.

Some  further  areas  of  research  include: 
1. finding an off-line simulation that is faster than our on-line simulation of a tree machine by a log-cost RAM.

2. finding an optimal simulation of a pointer machine (Schönhage, 1980) by a log-cost RAM.

3.  finding  an  optimal  simulation  of  a  unit-cost  RAM  by  a  log-cost  RAM. 